Skip to content

Fix JSON file corruption in save_payload_as_json when merge=True#97

Merged
jameshill merged 2 commits intobuildkite:mainfrom
altana-ai:fix/merge-json-write-outside-lock
Apr 9, 2026
Merged

Fix JSON file corruption in save_payload_as_json when merge=True#97
jameshill merged 2 commits intobuildkite:mainfrom
altana-ai:fix/merge-json-write-outside-lock

Conversation

@jasonwbarnett
Copy link
Copy Markdown
Contributor

@jasonwbarnett jasonwbarnett commented Mar 19, 2026

Summary

  • Bug: In save_payload_as_json, when merge=True, the file read happens inside the FileLock but the file write happens outside it. This creates a race condition where multiple concurrent processes can read the file, then all try to write simultaneously, resulting in JSON corruption or lost data.
  • Impact: This affects any environment where multiple pytest processes merge results into the same JSON file concurrently — for example, when pants runs pytest in parallel across multiple targets and each process merges its results into a shared output file.
  • Fix: Move the file write inside the FileLock context manager when merge=True, ensuring the read-modify-write cycle is atomic. Non-merge writes don't need the lock and are handled in a separate else branch.

Before (buggy)

if merge:
    lock = FileLock(f"{path}.lock")
    with lock:
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                existing_data = json.load(f)
            data = existing_data + data

with open(path, "w", encoding="utf-8") as f:  # <-- OUTSIDE the lock!
    json.dump(data, f)

After (fixed)

if merge:
    lock = FileLock(f"{path}.lock")
    with lock:
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                existing_data = json.load(f)
            data = existing_data + data
        with open(path, "w", encoding="utf-8") as f:  # <-- INSIDE the lock
            json.dump(data, f)
else:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

Test plan

  • All 5 existing save_json tests pass (test_save_json_payload_without_merge, test_save_json_payload_with_merge, test_save_json_payload_with_non_existent_file, test_save_json_payload_with_invalid_file, test_save_json_payload_with_large_data)
  • Verify in a concurrent environment (e.g., pants with parallel pytest targets) that the merged JSON output is no longer corrupted

Jason Barnett and others added 2 commits March 19, 2026 14:22
The file write was happening outside the FileLock when merge=True,
causing JSON corruption when multiple pytest processes (e.g., from
pants running tests in parallel) try to merge results into the same
file concurrently. Both the read and write must be inside the lock
to ensure atomicity.

Move the write inside the lock block when merge=True, and use an
else branch for non-merge writes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spawns 10 threads that simultaneously call save_payload_as_json with
merge=True on the same file. Verifies the result is valid JSON
containing all 10 entries.

This test fails with the buggy code (write outside lock) and passes
with the fix (write inside lock), confirming the race condition is
resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@jameshill jameshill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @jasonwbarnett — clean fix and good test coverage. The race condition in the read-modify-write cycle was a real one, especially for parallel pytest environments like pants. Appreciate you tracking it down and writing it up so clearly.

@jameshill jameshill merged commit 629d669 into buildkite:main Apr 9, 2026
11 checks passed
@jameshill jameshill mentioned this pull request Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants